THE GEOGRAPHIC BASIS OF THE DBS GEOCODING SYSTEM
FOR URBAN AREAS: AN OVERVIEW*
Résumé

The author examines the geographic basis of the DBS urban geocoding system in order to underline the importance of the spatial framework chosen for attaining the system's objective, which is to allow statistical data to be tabulated for 'ad hoc' regions delimited by the user. The study shows that three main elements of the spatial framework, namely the block-face, its address range and its geographic co-ordinates, exert a strong influence on each of the following phases of the system: geocoding, data storage and data retrieval. The author concludes that the development of any spatial information system requires thorough research into the geographic elements that ultimately control its use.
Abstract 


The geographic basis of the DBS urban geocoding system is examined with a view to underlining the importance of the system's spatial framework in the attainment of its objective, namely the special tabulation of data by 'ad hoc' user-specified areas. The study indicates that three main elements of the spatial framework (the block-face, its address range and its geographic co-ordinates) exert a strong influence on the system's geocoding, data storage and data retrieval stages. The author concludes that the development of any spatial information system necessitates research into the geographic elements that ultimately control system utility.


* This is a revised version of a paper drafted in November 1968. 
The author is grateful to his colleagues on the geocoding system team and elsewhere 
in DBS for their critical reading of the final draft and for their helpful comments. 
The author is, however, solely responsible for any errors or deficiencies in this paper. 
1. Introduction


1.1 The General Context 


Over the past two years the Dominion Bureau of Statistics (DBS) has conducted research into defining the small-area information problem and into developing a computerized system for its solution and for general utility. The outcome of this work is the Geographically Referenced Data Storage and Retrieval System, referred to throughout this paper as the DBS geocoding system. Two variants of the system are being developed, one for larger urban areas and the other, a compatible approach, for the remaining area of Canada. In this paper only the system for urban areas is discussed, and the term 'urban' in this context connotes an area suited to the existing street address conversion mode of geocoding.


The small-area information problem may be defined briefly, and in a geographical context, as a burgeoning demand for data pertaining to micro-areas, that is, areas of sub-municipality size, bearing little or no relationship to existing statistical area units, but nonetheless significant in the decision-making processes of management and planning, and in general conducive to applied research in the social and economic fields.(1) Increasingly, the spatial attribute of data observations has been imposing a constraint upon the general usefulness of data. Haggett (1965) enunciated this constraint on data utility and referred to the problem of "the yoking of locational observations to the characteristics of the collecting area". Hagerstrand in Sweden sought as early as 1955 "to investigate if a possibility exists to give to the space-aspect of statistical material the same neutrally objective status as the time-aspect always had" (Hagerstrand, 1955). Another problem with small-area information arises from the fact that much information is coded to areas such as census tracts, often in a non-geographic or quasi-geographic format, thus making it impossible to comprehend the location of one tract relative to another without the aid of supplementary information, typically in the form of maps (Kao, 1963).

(1) Further insight may be obtained from the series Census Tract Conference Papers, Bureau of the Census, Washington, D.C. See also Simmons, 1967; and US Department of Housing and Urban Development, 1968.

DBS accordingly formulated basic system objectives, of which the geographic aspects were:

(i) The geographical referencing of data observations within an ordered spatial universe or framework, the basic areal units of which would satisfy singly or in aggregation the area requirements of data users (subject to constraints of confidentiality and of the sampling and non-sampling errors associated with finely gauged areal units).
(ii) Ancillary data display capabilities; that is, graphic capabilities such as computer mapping that would derive from geographically referencing data by means of an automated spatial information system.


Several basic types of spatial framework were considered and, for urban areas, a street address conversion mode based on a 'nominal grid' and a 'geographic grid' was adopted for experimental development.(2)


1.2 The Purpose of the Study


The purpose of this study is to articulate the geographic basis of the DBS 
geocoding system for urban areas and to identify geographic considerations that 
ensue from system implementation. Of necessity the treatment is introductory. (3) 


(2) Nominal grid denotes the street pattern and similar linear non-street features such as rivers, railways and boundaries. Geographic grid refers to a Cartesian grid having a known origin and scaled abscissa and ordinate axes that enable positions on the earth's surface to be reckoned in co-ordinate values.


(3) The study of the geographic basis of the DBS geocoding system for urban areas 
is largely an 'untilled field'. Though this paper seeks to reveal the basic 
elements in the concept that might be considered geographic in character, it 
is felt at the outset that more questions may be raised than will have been 
resolved satisfactorily. 
Neither the procedures for implementing the system nor the benefits accruing through geographic referencing will be examined. Introductions to these topics can be found in: Canada. DBS, 1968; Fellegi and Weldon, 1967; and American Association of Geographers, 1964.


2. The Geographic Basis of the System 
2.1 The Basic Concept 


In order to furnish users with data for small areas specified by them on an 'ad hoc' basis, a spatial framework is required that consists of primary areal units to which data can be referenced or coded (hence "geocoding"), and from which, in aggregation, the query areas of data users can be constructed. Although the specific properties of the primary areal units vary according to the spatial framework and system adopted, there are at least three significant characteristics that they must exhibit in a computerized system designed to retrieve data for user-specified areas: (i) they must be small enough to function as building-blocks; (ii) they must be identifiable by a code; and (iii) the location of each one must be unique and specific within an ordered spatial universe.


In urban areas DBS has approached the geographical referencing of data to primary areal units by means of a concept known generally as 'street address conversion'.(4) The logic behind street address conversion as a technique for geographical referencing is as follows. The statistical population for which data are collected may be identified in urban areas by an address (specifically a civic, i.e. municipal, house-number form of address, such as 1210 Carling Avenue, Ottawa, Ontario). Addresses are pre-grouped conveniently, in most cases, into address ranges for block-faces(5) of city streets. A block-face is a meaningful areal unit to planners, administrators and researchers in general, and it is usually small enough to serve as the primary areal unit of a spatial framework. However, block-faces, while they may be a readily identifiable and discernible element of the street pattern, are not provided with a location-specific identification on the basis of their address ranges alone. The street name and address of a data observation do not describe the absolute location of the observation within an 'ordered' spatial framework, nor do they indicate its relative location with respect to all other data observations.

(4) The street address conversion concept was introduced to the Dominion Bureau of Statistics by members of the Urban Data Center, University of Washington, Seattle. For the detailed development of the concept see Dial, 1964; Calkins, 1965; Crawford Jr., 1967.


In order for block-faces to function as primary areal units within an automated spatial information system, they must each be provided with a location-specific identification. This latter step is achieved by relating the nominal grid to a geographic grid system (Vance, 1966; US Department of Housing and Urban Development, 1968). The Dominion Bureau of Statistics has adopted the 6° Universal Transverse Mercator Grid System for this purpose.


Block-faces may be represented as a point, the location of which in two-dimensional space is expressible as a set of co-ordinate values (x,y) within a geographic grid system. The co-ordinate identification of this point location is unique within the zone of the geographic grid system. The point that is chosen is the mid-point of the long axis of the block-face, set back from the street centre line a prescribed distance. This mid-point is termed the block-face 'centroid'.(6) The co-ordinate value for the centroid of the block-face can be used as a code that can be assigned to each data observation occurring on the block-face, thus identifying the data and simultaneously placing it in an ordered spatial framework.

(5) The term 'block-face' is used to designate one side of a street between neighbouring or consecutive intersections. The block-face constitutes the primary areal unit of the system's spatial framework, though on occasion it may be split either to respect the presence of a statistical area boundary or to retain some semblance of an optimum size. 'Address ranges' referred to in this study are the terminal civic house numbers for each block-face or split block-face. Their values are such that they enclose all individual civic house numbers in the primary areal unit.
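The centroid construction described above (mid-point of the long axis, recessed from the centre line) can be sketched in a few lines. This is an illustration only, not the DBS programme: it assumes a straight block-face bounded by two centre-line nodes, co-ordinates in grid units such as UTM metres, and a hypothetical `side` argument that selects the side of the street on which the block-face lies.

```python
import math


def block_face_centroid(start, end, setback, side):
    """Recessed mid-point of a straight block-face.

    start, end: (x, y) grid co-ordinates of the street centre-line nodes
                bounding the block-face.
    setback:    prescribed recess distance from the centre line, in grid units.
    side:       +1 for the side to the left of the direction start -> end,
                -1 for the side to the right (a hypothetical convention).
    """
    (x1, y1), (x2, y2) = start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("block-face end nodes coincide")
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2   # mid-point of the long axis
    nx, ny = -dy / length, dx / length       # unit normal pointing left
    return (mx + side * setback * nx, my + side * setback * ny)


print(block_face_centroid((0, 0), (100, 0), 15, +1))  # (50.0, 15.0)
```

The two sides of the same street segment share the mid-point and differ only in the sign of the offset, which is what gives each block-face its own distinct co-ordinate code.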


A 'conversion' can be effected between the nominal grid elements (street name and address range by block-face) and the geographic grid element (the centroid of the block-face expressed in co-ordinates). Data bearing a street address identification can be tested by computer against a file of address ranges and corresponding centroid values. Once the appropriate address range has been found for the address in question, its centroid value can be substituted for the street address, and the data can be stored on the basis of this newly acquired identification. Geographical referencing, or geocoding, will then have been accomplished.
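The test-and-substitute step can be sketched as a lookup against an address conversion file. The two records below are the ADAM ST entries from the example file in Section 2.2.1; the class and function names are hypothetical, the rule that a range holds only odd or only even numbers is an assumption (Section 3.3 notes that real addressing systems have exceptions), and a linear scan stands in for whatever indexed search a production file would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRange:
    """One record of a (hypothetical) address conversion file."""
    street: str
    low: int    # lower terminal civic number of the block-face
    high: int   # upper terminal civic number of the block-face
    x: int      # centroid easting
    y: int      # centroid northing
    zone: int   # grid zone

    def contains(self, number: int) -> bool:
        # Assumes each range holds only odd or only even numbers (one side of the street).
        return self.low <= number <= self.high and number % 2 == self.low % 2


# The two ADAM ST records from the example address conversion file.
CONVERSION_FILE = [
    AddressRange("ADAM ST", 1, 19, 481209, 4896212, 12),
    AddressRange("ADAM ST", 2, 18, 481217, 4896180, 12),
]


def geocode(street, number, conversion_file=CONVERSION_FILE):
    """Return the centroid (x, y, zone) for a civic address, or None if no range matches."""
    key = street.upper()
    for rng in conversion_file:
        if rng.street == key and rng.contains(number):
            return (rng.x, rng.y, rng.zone)
    return None  # no matching range: the address cannot be geocoded


print(geocode("Adam St", 7))   # (481209, 4896212, 12)
```

Once a centroid is returned, it replaces the street address as the storage key for the observation.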


2.2 Geographic Elements in the Concept 
2.2.1 Spatial Framework The basic elements of the geographic framework of a street address conversion system are derived from the urban landscape and a geographic grid system. Only those features that would be vital to the development of a spatial framework for the referencing of data, and that would provide attendant plotting and graphic capabilities, are abstracted from the urban landscape. It is with reference to these features that the term 'nominal grid' is used in this study. They may be classified as features of the street pattern and as non-street features, such as rivers, railways and area boundaries.

(6) The 'centroid' is a point location situated at the mid-point of a block-face (or any other primary areal unit) and recessed a standard distance from the street centre line. Its co-ordinate values serve both as a code attributable to all data observations on the block-face and as a unique location-specific identification for those data observations.


The spatial framework for the geographic referencing of data in urban areas is arrived at through the street pattern.(7) There are two basic elements: (i) portions of streets known as block-faces; and (ii) street address ranges for each of these (see footnote 5).

The nominal grid and the geographic grid are brought together by recording the mid-point or centroid of each block-face in terms of the co-ordinates of the geographic grid and relating all elements in a single address conversion file. For example:

Street     Block-face       Co-ordinates of block-face mid-point (centroid)
           address range    (X)         (Y)           (Zone)
ADAM ST    1-19             481,209     4,896,212     12
           2-18             481,217     4,896,180     12

The centroid code assigned to data observations by means of an address conversion file is the fundamental element of the automated system of geographic referencing of data for subsequent storage and retrieval.


(7) United States' plans along these lines have taken the form of Address Coding Guide (ACG) and Dual Independent Map Encoding (DIME) programmes for the 1970 Census.
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Qe2e2 Plotting and Graphic Capabilities The nominal grid lends itself to 
representation in 4 machine-processable form for both street features and 
non-street features and can, once associated with a geographic grid, be 


! 


susceptible to automated map plotting and to computer mapping (Tobler, 


1959). | | | Ee 


Nominal grid features are represented in a machine-processable form by defining them in terms of the geographic grid. Each street feature and non-street feature, identified by its name, is coded as a string of points or 'nodes' that identify sequentially its terminals, its intersections with other features and its abrupt changes of direction. Each node, once identified by a geographic grid co-ordinate, has a location-specific identification that places the node uniquely within an ordered spatial universe and in relative position to all other points described therein. Automatic plotting of features is achieved by connecting node strings by straight lines on the basis of the node co-ordinate values.


The co-ordinate location of centroids is calculated using the geographic co-ordinate values of the nodes that define the block-faces; thus, both the establishment of the spatial framework (primary areal units, address ranges, centroids) and the definition of features are achieved by means of a mutual process.
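The derivation of a centroid from the nodes that define a block-face can be sketched as follows. Because a block-face may bend at intermediate nodes, this sketch assumes (the text does not say) that the 'mid-point' is taken halfway along the node string's length and recessed perpendicular to the segment on which that point falls; `setback` and `side` are hypothetical parameters.

```python
import math


def polyline_centroid(nodes, setback, side):
    """Recessed mid-length point of a block-face given as a node string.

    nodes:   [(x, y), ...] from one intersection node to the next, including
             any nodes marking abrupt changes of direction.
    setback: recess distance from the centre line, in grid units.
    side:    +1 = left of the direction of the node string, -1 = right.
    """
    if len(nodes) < 2:
        raise ValueError("a block-face needs at least two nodes")
    segments = list(zip(nodes, nodes[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    remaining = sum(lengths) / 2            # distance still to travel to the mid-length point
    for (a, b), seg_len in zip(segments, lengths):
        if seg_len == 0:
            continue                        # skip repeated nodes
        if remaining <= seg_len:
            t = remaining / seg_len
            px = a[0] + t * (b[0] - a[0])
            py = a[1] + t * (b[1] - a[1])
            nx = -(b[1] - a[1]) / seg_len   # unit normal to this segment, pointing left
            ny = (b[0] - a[0]) / seg_len
            return (px + side * setback * nx, py + side * setback * ny)
        remaining -= seg_len
    raise ValueError("node string has zero length")


print(polyline_centroid([(0, 0), (20, 0), (20, 80)], 10, +1))  # (10.0, 30.0)
```

The same node strings serve both purposes named in the text: joined by straight lines they plot the feature, and measured along their length they locate the centroid.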


The suitability of the geocoding system for general street mapping by computer at any specified scale has been demonstrated in programme development. The geocoded data base would also lend itself to line-printer and more sophisticated computer mapping routines.




3. Geographic Considerations Ensuing from Concept Implementation 


Each of the geographic elements in the street address conversion concept is examined below.


3.1 The Block-face 

The block-face (see footnote 5) is one of two basic elements of the nominal grid, the other being the address range. The block-face was identified earlier in this paper as the primary areal unit of the spatial framework. Given this identification, it is not difficult to understand why it has also been considered the 'building-block' of the system, whereby user-specified areas may be constructed through aggregation. Conceptually, however, as will be seen below, the block-face is not so much the building-block of the system as is the 'address range'. The two basic functions of the block-face then become those of physically containing the address-range building-block, and of providing a convenient mental image or frame of reference with which the address range can be associated. Later in this paper the repercussions of choosing the block-face as the primary areal unit will be discussed in the context of data retrieval.


The block-face is perhaps most characterized by its lack of standardization. In this respect it is not, at first glance, a very satisfying prospect as a primary areal unit for a spatial framework. However, block-faces are generally small areal units that function adequately in conventional information systems and with which administrators, planners and presumably researchers at large can associate their information needs. In addition, the image of a block-face is readily projected, and actual block-faces can be delimited in the field. Figure 1 illustrates the variability in block-face dimensions and orientation.
FIGURE 1. BLOCK-FACE DIMENSIONS AND ORIENTATION
Arrows define block-face limits.


Conceptually, there is nothing sacrosanct about the block-face's integrity. A block-face may be 'split' into two or more portions as long as the dividing line(s) is discernible on the nominal grid, and address ranges, which include the statistical population along each portion, may be allocated. The splitting of a block-face must be done before the centroid is calculated. A block-face may be segmented for several reasons, but generally to respect a statistical boundary, to divide a 'super block-face' into more typical-sized dimensions, or to separate a block-face of disparate characteristics into more homogeneous units. There exists, therefore, a measure of flexibility in primary areal unit creation (Fig. 2). The DBS system will split block-faces only in exceptional cases, notably those of the super block-face variety, in an effort to preserve the spatial relationships of the nominal grid.
FIGURE 2. BLOCK-FACE SPLITS INTO TWO OR MORE PRIMARY AREAL UNITS
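The requirement that a block-face be split before its centroid is calculated can be illustrated with a small sketch. It assumes a straight block-face whose civic numbers increase from `start` to `end`, and a dividing line located by a hypothetical `cut_fraction` along the block-face together with the first civic number (`cut_number`) of the second portion. Each portion keeps its own terminal numbers as its address range, as footnote 5 requires.

```python
def split_block_face(start, end, addresses, cut_fraction, cut_number):
    """Split one straight block-face into two primary areal units.

    start, end:   centre-line end nodes (x, y); civic numbers are assumed to
                  increase from start to end.
    addresses:    civic numbers on this side of the street.
    cut_fraction: position of the dividing line along the block-face, 0 < f < 1.
    cut_number:   first civic number belonging to the second portion.
    Returns [(start, cut_node, (low, high)), (cut_node, end, (low, high))].
    """
    if not 0 < cut_fraction < 1:
        raise ValueError("the dividing line must fall inside the block-face")
    cut_node = (start[0] + cut_fraction * (end[0] - start[0]),
                start[1] + cut_fraction * (end[1] - start[1]))
    first = sorted(n for n in addresses if n < cut_number)
    second = sorted(n for n in addresses if n >= cut_number)
    if not first or not second:
        raise ValueError("each portion must receive at least one address")
    # Each portion's address range is its pair of terminal civic numbers.
    return [(start, cut_node, (first[0], first[-1])),
            (cut_node, end, (second[0], second[-1]))]
```

Each returned portion would then receive its own centroid, so a split block-face contributes two coded data points instead of one.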
Block-face variability is also present in terms of statistical population content. Though there may be a statistically homogeneous block-face, many block-faces are heterogeneous in composition (Fig. 3). Nonetheless, it would appear that, being small in size, a block-face exhibits greater homogeneity within itself than does a larger area, such as a census tract, of which it forms a part.


FIGURE 3. HETEROGENEITY OF BLOCK-FACE COMPOSITION 
One concluding remark about the block-face should be made. The block-face is a significant entity within the street address conversion concept only in so far as its statistical population can be identified and expressed in terms of an address range. A block-face whose content is not expressed by an address range (notably vacant land or parks, but possibly also dwellings) does not constitute a primary areal unit within the system. This situation can be rectified by establishing a pseudo-address range for the block-face and pseudo-addresses for the block-face components.


3.2 The Centroid 
The centroid has been defined in footnote 6. Though the centroid is generally the mid-point of the block-face recessed a standard distance from the street centre line, it may also be the recessed mid-point of a segment of a split block-face. The centroid might better be referred to, conceptually at least, as the 'primary area data point' or perhaps the 'coded data point'.


The nature of the centroid within the system is highly conditioned by the characteristics of the nominal grid, notably the street pattern and its block-face components. Some centroid characteristics are considered below.


The number of centroids is equal to the number of primary areal units (that is, block-faces and block-face segments) for which address ranges are present. The existing system, which calculates a centroid for a primary areal unit in association with an address range, could be modified to calculate a centroid for each and every block-face, with or without addresses.
The centroid represents the final abstraction of the urban landscape into the context of the geographic grid, for purposes of geocoding. In effect, an 'area' is mapped as a 'point', leaving a vacuum surrounding discrete points in place of a continuous spatial surface or plane. Each centroid is a point, having no areal extent. The co-ordinate values of the centroid are simply part of the co-ordinate field of the geographic grid. Vance (1966, p. 31) points out that the accuracy of these co-ordinate values is a function of the map scale available. The centroid is linked to the primary areal unit (be it a block-face or a fraction of a block-face) by an address range expression of that unit.


Centroid data content, that is, the data attributable to the centroid for data storage, is variable, for the same reasons mentioned in Section 3.1. As Figure 4 illustrates, the data content of centroids varies in both quantitative and qualitative terms.


FIGURE 4. VARIATION IN CENTROID DATA CONTENT 
In Figure 4 centroid 'A' is the code and spatial identification for 8 single-family dwellings; centroid 'B' groups two high-rise apartment buildings containing 200 dwellings; and lastly, centroid 'C' groups two small apartment buildings and three single-family houses for a total of 53 dwellings.
As mentioned above, the centroid is the point representation of an area. The relocation of data from their absolute location to a representative point location follows logically. It is diagrammed below (Fig. 5) simply for emphasis, and because of its significance in a data users' context.


FIGURE 5. DATA RELOCATION TO THE CENTROID 




The spacing of centroids naturally reflects the geometry of the nominal grid, specifically the street pattern. A regular grid street pattern (assuming address ranges throughout) results in a regular, orderly centroid distribution (see Fig. 6).

FIGURE 6. CENTROID SPACING AND PATTERN IN A GRID STREET NETWORK
In less regular street pattern areas the resultant spacing of centroids is less orderly. An example of the distribution of centroids in a curvilinear street area (a type found increasingly in suburban areas and new towns) is shown in Figure 7.


FIGURE 7. CENTROID SPACING AND PATTERN IN A CURVILINEAR STREET NETWORK 
The density of centroids, that is, the number of centroids per specified areal unit, is not constant, but does appear to exhibit certain tendencies of a locational nature. A cursory examination of centroids in one city alone, London, Ontario, revealed centroid density to be higher in the older central part of the city, where centroids numbered some 100-120 per square kilometre, than in the surrounding built-up residential areas, where some 60-80 centroids per square kilometre were calculated. Conceptually, therefore, centroid density decreases with distance from the centre of a city and with the progression from an ordered grid street network to an increasingly curvilinear and random street pattern.

This hypothesis, however, warrants further research in view of its implications for data users. An attempt is made in Figure 8 to illustrate the hypothesis.
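One way to test the density hypothesis is to count centroids in concentric rings around a chosen city centre and divide each count by the ring's area. The sketch below assumes centroid co-ordinates in metres (as UTM eastings and northings are) and a user-chosen centre and ring width; it is an illustrative procedure, not the method used in the London, Ontario, examination.

```python
import math


def density_by_ring(centroids, centre, ring_width_km, n_rings):
    """Centroids per square kilometre in concentric rings around `centre`.

    centroids: iterable of (x, y) in metres.
    Returns a list of densities, innermost ring first; centroids beyond the
    outermost ring are ignored.
    """
    counts = [0] * n_rings
    for x, y in centroids:
        d_km = math.hypot(x - centre[0], y - centre[1]) / 1000
        ring = int(d_km // ring_width_km)
        if ring < n_rings:
            counts[ring] += 1
    densities = []
    for i, count in enumerate(counts):
        inner, outer = i * ring_width_km, (i + 1) * ring_width_km
        area = math.pi * (outer ** 2 - inner ** 2)   # ring area in km^2
        densities.append(count / area)
    return densities
```

If the hypothesis holds, the returned densities should generally fall from the innermost ring outward, although a grid-versus-curvilinear comparison would also need the street pattern of each ring to be classified.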
3.3 The Address Range


The address range is defined in footnote 5. In the context of a street address conversion system, address ranges form the fundamental link between the nominal grid elements, namely the primary areal units (block-faces and fractions of block-faces), and the geographic grid elements, namely the centroids.


The address range effectively defines the statistical population that will be ascribed to the centroid. In this capacity it exercises a discriminating function, permitting only address-bearing data to enter the system. It is considered to enclose within its values all of the address numbers of a primary areal unit, but nothing is said about the location or distribution of individual addresses within that unit. For this reason, and because many block-faces do not contain address ranges, it is more accurate to consider the address range (as opposed to the block-face) as the true 'building-block' of the system (see Fig. 9).


FIGURE 9. THE ADDRESS RANGE IN TERMS OF THE BLOCK-FACE 
Although forming a viable link between the primary areal units and the geographic grid, the use of the address range is not without certain disadvantages, most of which can be overcome.


In the first place, it is necessary to define an address range for each primary areal unit. Once obtained, such address ranges must be kept up to date, reflecting changes in the street and addressing pattern as the city develops.


A second difficulty arises from anomalies in addressing systems. Though most civic house numbers increase sequentially along a block-face, a system must contend with even numbers occurring within odd address ranges (and vice versa), and occasionally with civic house numbers that cannot be included within a given address range (see Fig. 10).
FIGURE 10. ANOMALIES IN ADDRESS SYSTEMS
No difficulty occurs if the addresses are out of sequence, as long as they are either all odd or all even and the terminal addresses contain the others.
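The addressing anomalies described above lend themselves to an automated check when an address conversion file is built or updated. This sketch, with a hypothetical function name, flags numbers whose odd/even parity differs from that of the range's terminals and numbers falling outside the range; out-of-sequence numbers are deliberately not flagged, following the rule just stated.

```python
def range_anomalies(low, high, addresses):
    """Report civic numbers that break the usual rules for a block-face range.

    low, high: terminal civic numbers of the address range (same parity).
    addresses: civic numbers actually found on the block-face, in any order.
    """
    if low % 2 != high % 2:
        raise ValueError("terminal numbers must be both odd or both even")
    parity = low % 2
    wrong_parity = [n for n in addresses if n % 2 != parity]
    outside = [n for n in addresses if not low <= n <= high]
    # Sequence is not checked: the range need only enclose the numbers.
    return {"wrong_parity": wrong_parity, "outside_range": outside}
```

Flagged numbers would be candidates for manual treatment, for instance by adjusting the range or assigning a pseudo-address.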


A third difficulty presents itself in the case of dwellings (or any other statistical population) that do not bear an address. Such a situation, it is felt, is infrequent in large urban areas, and can be resolved by providing a 'pseudo' address for each occurrence.


The fact that centroids are calculated only for primary areal units with which address ranges are associated necessarily means that much of an urban area is not represented within the system (except for plotting purposes). These non-represented areas, such as parks, railway yards, cemeteries and parking lots, may form a significant part of the actual urban 'area' (see Fig. 11).


FIGURE 11. REPRESENTED VERSUS ACTUAL URBAN AREA
The disadvantages of the address range, mostly operational in nature, are minor in comparison with the major advantage of its widespread use. More specifically, individual addresses are used extensively as identification information for statistical populations by many disparate governmental and private agencies and firms. Since any data identified by an address can be accorded a centroid value, the data storage design can be 'open-ended' to include additional survey data.


4. Geographic Aspects of Data Storage and Retrieval 
A geocoded census data base may be viewed conceptually as one large storage file, though this file may be split into sub-files.


The data observations derived from the census questionnaire will be kept separate for each member of the statistical population. The members of the statistical population will be arranged within the geocoded census data file in terms of their position within their primary areal unit. Though kept discrete for purposes of cross-tabulation of characteristics, all members of the statistical population (and their associated characteristics) are grouped under the centroid of the primary areal unit in which they are situated. The primary areal units themselves will be arranged within the census data file in order of the co-ordinate values of their respective centroids.
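A minimal sketch of why ordering by centroid co-ordinates helps at the retrieval stage: once units are sorted by easting, a binary search narrows any rectangular query to a slice of the file before northings are checked. The (x, y, payload) record shape, the (x, y) sort key and the function names are assumptions for illustration, and all units are assumed to lie in one grid zone.

```python
import bisect


def build_index(units):
    """Order units by centroid co-ordinates; units are (x, y, payload) tuples."""
    ordered = sorted(units, key=lambda u: (u[0], u[1]))
    xs = [u[0] for u in ordered]   # parallel list of eastings for binary search
    return ordered, xs


def units_in_box(ordered, xs, xmin, xmax, ymin, ymax):
    """Payloads of units whose centroid lies in the rectangle.

    Binary search limits the scan to units with xmin <= x <= xmax;
    only those are then tested on y.
    """
    lo = bisect.bisect_left(xs, xmin)
    hi = bisect.bisect_right(xs, xmax)
    return [p for x, y, p in ordered[lo:hi] if ymin <= y <= ymax]
```

A rectangle enclosing a user's query area can serve as a cheap first filter, leaving exact boundary tests for the much smaller set of candidates it returns.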


As noted above, the geocoded census data file can be viewed conceptually as one file. Since this file is to be structured geographically on the basis of the co-ordinate values of centroids, it will merge the primary areal units determined under the urban system of geocoding with those primary areal units (of a different nature) designated within the rural geocoding system.
The ordering of primary areal units within the geocoded census data file on the basis of their centroids' co-ordinate values has advantages at the data retrieval stage. Some geographic considerations concerning retrieval from a geocoded data base are identified below.


Any request for data from users must incorporate four elements: statistical population, variable(s), time and space. The geocoding system has been designed primarily to contend with the spatial dimension discernible in any user request. In a narrower sense, it has been designed to allow for data retrieval by user-specified areas of sub-municipality size. The system seeks to provide tabulations for 'ad hoc' areas specified by users by manipulating data for finely gauged primary areal units that will aggregate as closely as possible to form the requested areas.


User-specified areas are expected to fall into four main categories: (i) bounded areas or 'polygons'; (ii) concentric zones or distance bands; (iii) street-oriented areas; and (iv) uniform data regions.


Figure 12 gives an example of a polygon request in which the 'area specification' called for data tabulation by traffic zones No. 1-10 for a certain ...


FIGURE 12. USER-SPECIFIED AREAS: POLYGONAL 
Other examples of bounded polygon sets might include school districts, planning neighbourhoods, wards and police districts.


Figure 13 illustrates a request for data tabulations by concentric zones, as defined by radii from a central point.


FIGURE 13. USER-SPECIFIED AREAS: CONCENTRIC 


Area specification: 5 miles and 10 miles from City Hall


Figure 14 illustrates the specifications of a request for data for both sides of a street between specified intersections.


FIGURE 14. USER-SPECIFIED AREAS: STREET-ORIENTED
Figure 15 illustrates type (iv), a form of areal retrieval that may ultimately be practicable, i.e. retrieval of tabulated data and regional delimitation based on homogeneous characteristics. This form of retrieval, the opposite of retrieval by predefined areas, is discussed briefly by Fellegi and Weldon (1967, p. 55).


FIGURE 15. USER-SPECIFIED AREAS: UNIFORM DATA
The suitability of the block-face and split block-face as primary areal units for constructing user-specified areas will vary according to the type of area specification submitted by the user.


The block-face will be a suitable primary areal unit in the case of 
bounded user request areas, type (i) above, as long as the boundaries of the 
request areas do not dissect block-faces. If these boundaries cut block-faces 


in places, then the tabulation can only be considered approximate. 


In type (ii) above, concentric zones, the boundary of the user request area(s) is a circle, and hence tabulations from block-face primary areal units will of necessity be approximate. Requests of this nature generally seek only an approximate answer.
When users request data for street-oriented areas, type (iii) above, block-face aggregation will provide an exact answer.
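Why street-oriented requests can be answered exactly may be shown with a small sketch. It assumes, hypothetically, that each block-face record carries a street name, a side, a low and high address, and a count, and that the two bounding intersections have already been translated into address limits on that street. Since a block-face runs from one intersection to the next, each one lies wholly inside or wholly outside the requested segment, so no apportioning is needed. The record layout and function name are illustrative, not part of the DBS system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockFace:
    street: str
    side: str    # "odd" or "even" side of the street (illustrative)
    low: int     # lowest address in the block-face address range
    high: int    # highest address in the block-face address range
    count: int   # statistic coded to this block-face (hypothetical)


def street_segment_total(block_faces, street, from_addr, to_addr):
    """Total for both sides of `street` between two intersections, given
    as the address limits from_addr..to_addr (an assumed translation step).

    A block-face is included only if its whole address range lies within
    the limits; because block-faces end at intersections, the sum is exact.
    """
    return sum(
        bf.count
        for bf in block_faces
        if bf.street == street and bf.low >= from_addr and bf.high <= to_addr
    )


faces = [
    BlockFace("MAIN ST", "odd", 101, 199, 12),
    BlockFace("MAIN ST", "even", 100, 198, 9),
    BlockFace("MAIN ST", "odd", 201, 299, 4),
    BlockFace("ELM AVE", "odd", 101, 199, 30),
]
print(street_segment_total(faces, "MAIN ST", 100, 199))
```

Both sides of the street are covered simply because the selection ignores the `side` field.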


Similarly, the request for data by uniform data regions (inverse retrieval) acknowledges from the outset the suitability of the block-face primary areal unit, since no finer areal unit exists to function in that capacity.


The general approach to data retrieval defines the boundaries of user areas in terms of the geographic grid, subsequently references the file of centroids and, using a point-in-polygon geometric algorithm, determines whether these centroids, according to their co-ordinate values, are 'inside' or 'outside' the defined areas. The data coded with the centroids found to lie inside the defined areas are tabulated in accordance with the user's specifications.
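The retrieval step just described can be sketched with the common even-odd (ray-casting) point-in-polygon test. This is one standard technique, chosen here for illustration; the paper does not say which algorithm the DBS system used. The sketch assumes centroid data held as a hypothetical mapping from (x, y) co-ordinates to a count, and user areas given as lists of boundary vertices on the same grid.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: cast a ray from (x, y) towards +x and count how many
    polygon edges it crosses; an odd number of crossings means 'inside'."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed;
        # this also guarantees y1 != y2, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tabulate_by_polygons(centroid_data, polygons):
    """centroid_data: {(x, y): count} (hypothetical layout).
    polygons: {area_name: [(x, y), ...]} boundary vertices in order.
    Returns the total count of centroids found inside each area."""
    totals = {name: 0 for name in polygons}
    for (x, y), count in centroid_data.items():
        for name, boundary in polygons.items():
            if point_in_polygon(x, y, boundary):
                totals[name] += count
    return totals


zones = {
    "zone 1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "zone 2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
data = {(5, 5): 100, (15, 5): 40, (25, 5): 7}
print(tabulate_by_polygons(data, zones))
```

Each centroid is tested against every requested area, so overlapping areas would each receive the count; a centroid lying exactly on a boundary is assigned by the even-odd rule's conventions rather than by any explicit policy.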


One final note on data retrieval should be made. Primary areal units, being small and having a variable statistical population content, may present problems, even in aggregation, in terms of maintaining confidentiality and/or of exhibiting a high sampling and … error. In the former case, suppression of data would be required in accordance with the regulations of the Statistics Act and, in the latter case, an estimate of such errors might have to be provided with the tabulations.
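The suppression step can be illustrated with a minimal threshold rule. The threshold value and function name below are hypothetical; the paper does not state the actual rules applied under the Statistics Act, and a real disclosure-control procedure would also need complementary suppression so that a suppressed cell cannot be recovered by subtraction from totals, which this sketch does not attempt.

```python
# Hypothetical minimum cell size; NOT the actual rule under the Statistics Act.
MIN_CELL_COUNT = 5


def suppress_small_cells(table, min_count=MIN_CELL_COUNT):
    """Return a copy of `table` (area name -> count) in which every count
    below `min_count`, including zero, is replaced by None to mark it as
    suppressed. No complementary suppression is performed."""
    return {
        area: (count if count >= min_count else None)
        for area, count in table.items()
    }


print(suppress_small_cells({"A": 120, "B": 3, "C": 5}))
```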
5. Summary and Conclusion


Those elements of the street address conversion mode of geocoding for urban areas that might be considered as geographic in nature have been identified as the block-face, the centroid and the address range. The block-face and the address range have been construed as elements of a nominal grid that functions as a spatial framework for referencing data. However, as the nominal grid is an imperfect one with which to determine either relative or absolute location, it is merged with a geographic grid system by providing each primary areal unit of the nominal grid with a centroid, or co-ordinate location, within the geographic grid.
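The merging of the nominal grid with the geographic grid can be pictured as a lookup: a street address is matched to the block-face whose address range contains it, and the data observation then takes on that block-face's centroid. The sketch below assumes a hypothetical record layout, a linear scan instead of any real file organization, and the convention that odd and even numbers lie on opposite sides of the street, checked by parity against the low end of each range. It is an illustration of the idea, not the DBS conversion program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockFaceRecord:
    street: str
    low: int                        # lowest address in the range
    high: int                       # highest address in the range
    centroid: tuple[float, float]   # (x, y) on the geographic grid


def geocode(street, number, index):
    """Return the centroid of the block-face whose address range contains
    `number` on `street`, or None if no range matches.

    Assumes each range holds only odd or only even numbers, so a candidate
    must share the parity of the range's low address."""
    for rec in index:
        if (
            rec.street == street
            and rec.low <= number <= rec.high
            and (number - rec.low) % 2 == 0
        ):
            return rec.centroid
    return None


index = [
    BlockFaceRecord("KING ST", 101, 199, (1000.0, 2000.0)),
    BlockFaceRecord("KING ST", 100, 198, (1000.0, 2010.0)),
]
print(geocode("KING ST", 150, index))
```

Here 150 falls within both numeric ranges, but only the even-numbered block-face accepts it, so the observation is coded with that side's centroid.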


The block-face, block-face centroid and block-face address range elements have each been examined in detail in order to describe the role that they play in a street address conversion system and to determine the manner in which their characteristics affect the attainment of geocoding system objectives. The study reveals that the address range may be considered the real building-block of the spatial framework, though the block-face serves to physically contain this range and to convey a convenient mental image of it. The centroid, however, is the element most fundamental to system operation. Expressed in geographic co-ordinates, it serves as a code attributable to data observations for the block-face in question and as a unique location-specific identifier for those observations. The Cartesian relationship of centroid co-ordinates permits data for all block-faces to be stored in a single randomly accessible data base, for subsequent aggregation by block-faces to satisfy user requests for data by non-standard areas.
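One consequence of keying data by Cartesian centroid co-ordinates is that the data base can be ordered spatially. As a minimal sketch, assuming hypothetical (x, y, value) records held in memory, sorting by the x co-ordinate lets a rectangular window of block-faces be located by binary search over x, with only the y co-ordinate checked by scanning. This simple one-dimensional index is an illustration chosen here, not the storage scheme the paper describes.

```python
import bisect


class CentroidFile:
    """In-memory stand-in for a single data base keyed by block-face
    centroid co-ordinates (hypothetical record layout: (x, y, value))."""

    def __init__(self, records):
        # Sort by x so that an x-range can be found by binary search.
        self._rows = sorted(records)
        self._xs = [row[0] for row in self._rows]

    def window_total(self, xmin, xmax, ymin, ymax):
        """Sum values for centroids with xmin <= x <= xmax and
        ymin <= y <= ymax."""
        lo = bisect.bisect_left(self._xs, xmin)
        hi = bisect.bisect_right(self._xs, xmax)
        return sum(v for _, y, v in self._rows[lo:hi] if ymin <= y <= ymax)


store = CentroidFile([(1, 1, 10), (2, 5, 20), (3, 2, 30), (8, 8, 40)])
print(store.window_total(0, 3, 0, 3))
```

A rectangular window of this kind could serve as a coarse first filter before the exact point-in-polygon test is applied to the surviving centroids.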
The limitations of each element have also been pointed out. None of these, however, would seriously curtail system objectives.


The following point can be made as a result of this study. The geographic 


elements of the system have been shown to underpin the entire system concept. 


It seems reasonable to conclude that the development of any spatial information 


system should include fundamental research into the nature of the spatial frame- 


work that ultimately will affect the utility of the system. 